Lesson 3 Introduction: Tackling Non-Linear Classification
EvoClass-AI002 Lecture 3
00:00

We are moving beyond the limitations of linear models, which struggle to classify data that is not separable by a straight line. Today, we apply the PyTorch workflow to build a Deep Neural Network (DNN) capable of learning complex, non-linear decision boundaries essential for real-world classification tasks.

1. Visualizing the Need for Non-Linearity

Our first step is to create a challenging synthetic dataset, such as the two-moons distribution, to demonstrate visually why simple linear models fail. This setup forces us to use deeper architectures that can approximate the intricate curve separating the classes.

Data Properties

  • Data Structure: Synthetic data features (e.g., $1000 \times 2$ for $1000$ samples with 2 features).
  • Output Type: A single probability per sample, stored as torch.float32, representing class membership.
  • Goal: To create a curved decision boundary through layered computation.
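A minimal sketch of this setup, assuming scikit-learn's make_moons as the synthetic generator (the lecture's data_setup.py may differ in noise level and random seed):

```python
import torch
from sklearn.datasets import make_moons

# Generate the two-moons dataset: 1000 samples, 2 features each.
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)

# Convert to float32 tensors; labels get shape (1000, 1) to match the
# single-probability output the network will produce.
X = torch.tensor(X, dtype=torch.float32)               # shape: (1000, 2)
y = torch.tensor(y, dtype=torch.float32).unsqueeze(1)  # shape: (1000, 1)

print(X.shape, y.shape, X.dtype)
# torch.Size([1000, 2]) torch.Size([1000, 1]) torch.float32
```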
The Power of Non-Linear Activations
The core principle of DNNs is the introduction of non-linearity in hidden layers via functions like ReLU. Without them, stacking layers collapses into a single linear model regardless of depth, because a composition of linear maps is itself linear.
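To see why, here is an illustrative sketch (not from the lecture) comparing two stacked linear layers with and without a ReLU between them:

```python
import torch
import torch.nn as nn

x = torch.randn(5, 2)
lin1, lin2 = nn.Linear(2, 8), nn.Linear(8, 1)

# Without an activation, the two layers compose into one affine map:
# lin2(lin1(x)) equals a single Linear(2, 1) for some weights, so depth adds nothing.
no_activation = lin2(lin1(x))

# With ReLU in between, the composition is piecewise-linear and can bend
# the decision boundary around the two moons.
with_relu = lin2(torch.relu(lin1(x)))
```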
Question 1
What is the primary purpose of the ReLU activation function in a hidden layer?
Introduce non-linearity so deep architectures can model curves
Speed up matrix multiplication
Ensure the output remains between 0 and 1
Normalize the layer output to a mean of zero
Question 2
Which activation function is required in the output layer for a binary classification task?
Sigmoid
Softmax
ReLU
Question 3
Which loss function corresponds directly to a binary classification problem using a Sigmoid output?
Binary Cross Entropy Loss (BCE)
Mean Squared Error (MSE)
Cross Entropy Loss
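For reference, a short sketch of the pairing quizzed in Questions 2 and 3: a Sigmoid output with BCELoss, alongside the equivalent (and more numerically stable) BCEWithLogitsLoss, which applies the sigmoid internally:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 1)                        # raw model outputs, one per sample
targets = torch.tensor([[0.], [1.], [1.], [0.]])  # binary labels as float32

# Option A: Sigmoid in the output layer, then BCELoss on the probabilities.
probs = torch.sigmoid(logits)
loss_a = nn.BCELoss()(probs, targets)

# Option B: keep raw logits and let BCEWithLogitsLoss apply the sigmoid internally.
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_a.item(), loss_b.item())  # equal up to floating-point error
```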
Challenge: Designing the Core Architecture
Integrating architectural components for non-linear learning.
You must build an nn.Module for the two-moons task. Input features: 2. Output classes: 1 (probability).
Step 1
Describe the flow of computation for a single hidden layer in this DNN.
Solution:
Input $\to$ Linear Layer (Weight Matrix) $\to$ ReLU Activation $\to$ Output to Next Layer.
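In PyTorch terms, that flow for one hidden layer looks roughly like the following (the hidden width of 16 is an assumed choice, not specified by the challenge):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 2)        # a small batch of (N, 2) inputs
hidden = nn.Linear(2, 16)    # weight matrix plus bias

z = hidden(x)                # linear step: x @ W.T + b -> shape (8, 16)
a = torch.relu(z)            # ReLU applied element-wise
# `a` is passed on as the input to the next layer.
```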
Step 2
What must the final layer size be if the input shape is $(N, 2)$ and we use BCE loss?
Solution:
The final linear layer must have 1 output unit, producing a tensor of shape $(N, 1)$: one probability score per sample, matching the label shape expected by BCE.
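Putting both steps together, a minimal sketch of the challenge model (the class name and hidden width are assumptions, not the lecture's reference solution):

```python
import torch
import torch.nn as nn

class MoonClassifier(nn.Module):
    """Two-moons classifier: 2 input features -> 1 output probability."""
    def __init__(self, hidden_units: int = 16):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=hidden_units)
        self.layer_2 = nn.Linear(in_features=hidden_units, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear -> ReLU -> Linear -> Sigmoid: one probability per sample.
        return torch.sigmoid(self.layer_2(torch.relu(self.layer_1(x))))

model = MoonClassifier()
out = model(torch.randn(1000, 2))
print(out.shape)  # torch.Size([1000, 1]) -- matches the (N, 1) label shape for BCE
```

In practice you might instead return raw logits and train with BCEWithLogitsLoss for better numerical stability; the sketch applies the Sigmoid explicitly only to mirror the Sigmoid-plus-BCE pairing used in this lesson.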